This report analyzes Croatian web media coverage of Generative AI in education from 2023 to 2025. Using computational frame analysis and natural language processing, we examine how media narratives have evolved from initial panic to gradual integration.
Key Findings
Coverage volume: Substantial media attention with identifiable peaks around key events
Dominant frames: OPPORTUNITY and REGULATION frames predominate over THREAT
Narrative evolution: Clear shift from panic-focused to integration-focused coverage
Source variation: Significant differences in framing between outlet types
1 Introduction
1.1 Background
The release of ChatGPT in November 2022 triggered a global conversation about artificial intelligence in education. Croatia, like many countries, witnessed intense media debate about the implications of generative AI for students, teachers, and educational institutions.
1.2 Research Questions
This analysis addresses four core questions:
Volume & Timing: How much coverage exists, and when did it peak?
Framing: Which interpretive frames dominate, and how do they shift over time?
Actors: Who is represented in coverage, and who is given voice?
Sources: Do different media types frame AI in education differently?
1.3 Theoretical Framework
Our analysis draws on:
Framing Theory (Entman, 1993): Media frames as patterns of selection and emphasis
Moral Panic Theory (Cohen, 1972): Technology adoption often follows panic cycles
Diffusion of Innovations (Rogers, 1962): Media coverage mirrors adoption stages
2 Data and Methods
2.1 Data Source
# ==============================================================================
# DATA LOADING
# ==============================================================================
# Use the configured path, or try to find the file
if (exists("DATA_FILE_PATH") && file.exists(DATA_FILE_PATH)) {
  data_file <- DATA_FILE_PATH
} else {
  # Try multiple possible locations for the data file
  possible_paths <- c(
    "./dta.xlsx",
    "../dta.xlsx",
    "dta.xlsx",
    file.path(getwd(), "dta.xlsx"),
    "D:/LUKA/Academic/HKS/Clanci/AI u obrazovanju/dta.xlsx"
  )
  data_file <- NULL
  for (path in possible_paths) {
    if (file.exists(path)) {
      data_file <- path
      break
    }
  }
  if (is.null(data_file)) {
    cat("Current working directory:", getwd(), "\n")
    cat("Files in current directory:\n")
    print(list.files(pattern = "\\.xlsx$", recursive = TRUE))
    stop("Could not find dta.xlsx. Set DATA_FILE_PATH in the setup chunk.")
  }
}
cat("Loading data from:", data_file, "\n")
Loading data from: ./dta.xlsx
# Load the pre-processed data
# (read.xlsx() is assumed to come from the openxlsx package)
library(openxlsx)
raw_data <- read.xlsx(data_file)
cat("Dataset loaded successfully\n")
Frame detection employs a dictionary-based approach, where each frame is operationalized through a curated set of Croatian-language keywords. For each article, we count occurrences of dictionary terms in the combined title and body text. An article is coded as containing a frame if at least one dictionary term is present. The dominant frame is determined by the highest keyword count across all eight frames.
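The frame-coding rule above can be sketched in R as follows. This is a minimal illustration, not the study's code: the three mini-dictionaries, the helper names count_frame_terms() and code_article(), and the tie-breaking rule (the first frame in dictionary order wins) are assumptions, since the report neither lists its keywords nor states how ties are resolved. It uses stringr, which the report's other code already relies on.

```r
library(stringr)

# Hypothetical mini-dictionaries for three of the eight frames; the study's
# actual Croatian keyword lists are not reproduced in the report.
frame_dict <- list(
  THREAT      = c("varanje", "plagijat", "opasnost"),
  OPPORTUNITY = c("prilika", "alat", "inovacija"),
  REGULATION  = c("pravilnik", "zakon", "smjernice")
)

# Count whole-word, case-insensitive matches of any term in `text`.
# stringr uses ICU regular expressions, whose \b treats letters such as
# č, ć, š, ž and đ as word characters.
count_frame_terms <- function(text, terms) {
  pattern <- paste0("\\b(?:", paste(terms, collapse = "|"), ")\\b")
  as.integer(str_count(text, regex(pattern, ignore_case = TRUE)))
}

# Code one article: per-frame counts, presence flags, and dominant frame.
# Ties go to the first frame in dictionary order (which.max); this is an
# assumption, not a rule stated in the report.
code_article <- function(title, body, dict = frame_dict) {
  text <- paste(title, body)  # combined title and body text
  counts <- vapply(dict, function(terms) count_frame_terms(text, terms),
                   integer(1))
  list(
    counts   = counts,
    present  = counts > 0,
    dominant = if (all(counts == 0)) "NONE" else names(counts)[which.max(counts)]
  )
}

res <- code_article("Plagijat i varanje", "Ministarstvo najavljuje nove smjernice.")
res$counts    # THREAT 2, OPPORTUNITY 0, REGULATION 1
res$dominant  # "THREAT"
```

Because of the word boundaries, inflected forms are not matched (varanja does not count as varanje), so each dictionary has to list every inflected form it should capture.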
The detection algorithm iterates through each article, applying regular-expression matching with word boundaries (\b) so that dictionary terms do not match inside longer words. Sentiment is computed as a simple difference score: positive word count − negative word count. Articles with scores above +2 are classified as positive, below −2 as negative, and the remainder (−2 to +2 inclusive) as neutral. The ±2 threshold keeps articles with only a small imbalance between positive and negative terms in the neutral category.
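A minimal sketch of the sentiment rule, assuming the positive and negative lexicon counts have already been computed per article with the same word-boundary matching. The function name classify_sentiment() is hypothetical. Because both thresholds are strict, scores of exactly +2 or −2 land in the neutral category.

```r
# Sentiment rule as described: score = positive count - negative count;
# score > 2 is positive, score < -2 is negative, anything else is neutral.
classify_sentiment <- function(n_positive, n_negative) {
  score <- n_positive - n_negative
  category <- ifelse(score > 2, "positive",
                     ifelse(score < -2, "negative", "neutral"))
  data.frame(sentiment_score = score, sentiment_category = category)
}

# Scores 4, 2, -3, -2 map to positive, neutral, negative, neutral
classify_sentiment(n_positive = c(5, 3, 1, 0), n_negative = c(1, 1, 4, 2))
```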
cat("Articles with at least one frame:", sum(clean_data$frame_count > 0), "\n")
Articles with at least one frame: 3154
3 Results
3.1 Coverage Overview
This section presents descriptive statistics for the corpus. The share of articles with at least one detected frame measures dictionary coverage; a share below 70% may indicate that the dictionaries need expanding.
The bar chart displays monthly article counts, with a LOESS smoothing curve (red) and its 95% confidence band indicating the underlying trend. Peaks appear to coincide with external events (e.g., the ChatGPT launch and its aftermath, academic calendar milestones). The curve uses the default span of ggplot2’s geom_smooth() for LOESS (span = 0.75), a fixed setting rather than an automatically selected bandwidth.
monthly_stats <- clean_data %>%
  group_by(year_month) %>%
  summarise(
    n_articles = n(),
    prop_THREAT = mean(frame_THREAT_present, na.rm = TRUE),
    prop_OPPORTUNITY = mean(frame_OPPORTUNITY_present, na.rm = TRUE),
    prop_REGULATION = mean(frame_REGULATION_present, na.rm = TRUE),
    mean_sentiment = mean(sentiment_score, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(monthly_stats, aes(x = year_month, y = n_articles)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_smooth(method = "loess", se = TRUE, color = "#d7191c", linewidth = 1.2) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Media Coverage of AI in Croatian Education",
    subtitle = "Monthly article count with trend line",
    x = NULL, y = "Number of Articles"
  )
Figure 1: Monthly Coverage Volume
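The report does not show how year_month is derived. A plausible sketch, assuming a hypothetical Date column pub_date and using synthetic dates, floors each date to the first day of its month (so that scale_x_date() applies) and fits the trend with the span stated explicitly; 0.75 is the default in both stats::loess() and geom_smooth().

```r
library(dplyr)

# Synthetic publication dates covering 2023-2024 (illustration only)
set.seed(1)
articles <- data.frame(
  pub_date = as.Date("2023-01-01") + sample(0:729, 500, replace = TRUE)
)

# Floor each date to the first of its month, keeping the Date class
monthly <- articles %>%
  mutate(year_month = as.Date(format(pub_date, "%Y-%m-01"))) %>%
  count(year_month, name = "n_articles")

# LOESS trend with the span made explicit instead of left implicit
trend <- loess(n_articles ~ as.numeric(year_month), data = monthly, span = 0.75)
monthly$trend <- predict(trend)
```

Stating the span makes the smoothing choice visible and easy to vary in a sensitivity check.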
3.1.3 Day of Week Patterns
Publication timing can reveal editorial routines. Weekday concentration suggests news-driven coverage, while weekend spikes may indicate feature or opinion pieces. Bar labels give each day’s share of all dated articles, so the percentages sum to 100%.
dow_stats <- clean_data %>%
  filter(!is.na(day_of_week)) %>%
  count(day_of_week) %>%
  mutate(percentage = n / sum(n) * 100)

ggplot(dow_stats, aes(x = day_of_week, y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.8) +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), vjust = -0.5, size = 3.5) +
  labs(
    title = "Publication Day Patterns",
    x = NULL, y = "Number of Articles"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 2: Publication Patterns by Day of Week
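If day_of_week is stored as plain text, ggplot2 orders the bars alphabetically, and weekdays() returns names in the session locale’s language. A locale-independent alternative, sketched with a hypothetical helper weekday_factor(), builds a Monday-first factor from the ISO weekday number returned by format(x, "%u") (1 = Monday, 7 = Sunday).

```r
# Monday-first weekday factor that does not depend on the system locale
weekday_factor <- function(dates) {
  iso <- as.integer(format(dates, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
  factor(iso, levels = 1:7,
         labels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday"))
}

# 2023-01-02 was a Monday and 2023-01-08 a Sunday
weekday_factor(as.Date(c("2023-01-02", "2023-01-08")))
```

Keeping all seven levels also means a day with no articles still appears on the axis.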
3.2 Frame Analysis
Frame analysis quantifies how media construct meaning around AI in education. Each article receives a dominant frame assignment based on maximum keyword frequency. Articles with zero matches across all dictionaries are coded as “NONE.”
3.2.1 Dominant Frames
The horizontal bar chart ranks frames by article count. Percentages give each frame’s share of all articles, including those coded “NONE.” A high “NONE” proportion may signal dictionary gaps or genuinely frame-neutral content.
frame_dist <- clean_data %>%
  count(dominant_frame, sort = TRUE) %>%
  mutate(
    percentage = n / sum(n) * 100,
    dominant_frame = factor(dominant_frame, levels = dominant_frame)
  )

ggplot(frame_dist, aes(x = reorder(dominant_frame, n), y = n, fill = dominant_frame)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), hjust = -0.1, size = 3.5) +
  scale_fill_manual(values = frame_colors) +
  coord_flip() +
  labs(
    title = "Distribution of Dominant Frames",
    subtitle = "Based on highest frame word count per article",
    x = NULL, y = "Number of Articles"
  ) +
  theme(legend.position = "none") +
  expand_limits(y = max(frame_dist$n) * 1.15)
Figure 3: Distribution of Dominant Frames
3.2.2 Frame Evolution Over Time
This time series tracks, for the three focal frames (THREAT, OPPORTUNITY, REGULATION), the monthly proportion of articles containing each frame. The proportions are not mutually exclusive, because an article may contain several frames. Rising lines indicate increasing frame salience; converging lines may suggest narrative consolidation.
frame_evolution <- monthly_stats %>%
  select(year_month, prop_THREAT, prop_OPPORTUNITY, prop_REGULATION) %>%
  pivot_longer(-year_month, names_to = "frame", values_to = "proportion") %>%
  mutate(frame = str_remove(frame, "prop_"))

ggplot(frame_evolution, aes(x = year_month, y = proportion, color = frame)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  scale_color_manual(values = frame_colors) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Frame Prevalence Over Time",
    subtitle = "Proportion of articles containing each frame",
    x = NULL, y = "Proportion of Articles",
    color = "Frame"
  )
Figure 4: Evolution of Media Frames Over Time
3.2.3 Frame Co-occurrence
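The heatmap displays normalized co-occurrence frequencies. Each row is divided by its diagonal (self-occurrence) count, so the cell in row Frame₁ and column Frame₂ gives P(Frame₂ | Frame₁): the share of articles containing Frame₁ that also contain Frame₂. The matrix is therefore generally not symmetric. Values approaching 1.0 indicate that Frame₂ almost always accompanies Frame₁; the diagonal is 1.0 by definition.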
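The row normalization can be computed directly from a 0/1 article-by-frame presence matrix: crossprod() gives the joint counts, and sweep() divides each row by its diagonal entry. The toy matrix below is invented for illustration; in the report its columns would correspond to the frame_*_present variables.

```r
# Toy 0/1 presence matrix: 5 articles x 3 frames (illustrative values)
presence <- cbind(
  THREAT      = c(1, 1, 0, 1, 0),
  OPPORTUNITY = c(1, 0, 1, 1, 1),
  REGULATION  = c(0, 1, 1, 1, 0)
)

# [i, j] = number of articles containing both frame i and frame j;
# the diagonal holds the number of articles containing each frame
co_counts <- crossprod(presence)

# Divide row i by the count of frame i, giving P(column frame | row frame)
cond_prob <- sweep(co_counts, 1, diag(co_counts), "/")
round(cond_prob, 2)
```

In this toy example P(OPPORTUNITY | THREAT) = 2/3 while P(THREAT | OPPORTUNITY) = 1/2, which is why the heatmap is read row by row. A frame that never occurs would produce a row of NaN values (0/0).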
We partition the study period into four theoretically motivated phases based on moral panic and diffusion theory. Phase boundaries are set a priori from expected narrative transitions: emergence (initial reaction), debate (contested meaning), integration (policy response), and normalization (routinization). The grouped bar chart compares frame prevalence across phases.
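Assigning articles to a priori phases amounts to binning publication dates. A sketch with cut(), in which the boundary dates and the helper name assign_phase() are placeholders rather than the study’s actual cut points (the first break is ChatGPT’s public release on 30 November 2022):

```r
# Placeholder phase boundaries; the report does not state its cut points
phase_breaks <- as.Date(c("2022-11-30", "2023-06-01", "2024-01-01",
                          "2024-09-01", "2026-01-01"))
phase_labels <- c("Emergence", "Debate", "Integration", "Normalization")

assign_phase <- function(dates) {
  # right = FALSE makes each interval [start, next start), so a boundary
  # date belongs to the phase that begins on it
  cut(dates, breaks = phase_breaks, labels = phase_labels, right = FALSE)
}

assign_phase(as.Date(c("2023-02-15", "2024-01-01", "2025-03-10")))
```

Dates outside the outermost breaks become NA, so the breaks need to span the whole study period.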
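Sentiment provides an aggregate valence measure independent of specific frames. The trajectory plot shows monthly mean sentiment; green shading marks months with positive mean sentiment and red shading months with negative mean sentiment. Zero represents a neutral balance between positive and negative lexicon matches. Because scores are raw count differences, longer articles can contribute larger absolute values to the monthly mean.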
ggplot(monthly_stats, aes(x = year_month)) +
  geom_ribbon(aes(ymin = 0, ymax = pmax(mean_sentiment, 0)), fill = "#4daf4a", alpha = 0.5) +
  geom_ribbon(aes(ymin = pmin(mean_sentiment, 0), ymax = 0), fill = "#e41a1c", alpha = 0.5) +
  geom_line(aes(y = mean_sentiment), linewidth = 1.2, color = "black") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b\n%Y") +
  labs(
    title = "Sentiment Trajectory Over Time",
    subtitle = "Mean sentiment score (positive − negative word counts)",
    x = NULL, y = "Mean Sentiment Score"
  )
Figure 7: Sentiment Trajectory Over Time
sentiment_dist <- clean_data %>%
  count(sentiment_category) %>%
  mutate(percentage = n / sum(n) * 100)

ggplot(sentiment_dist, aes(x = sentiment_category, y = n, fill = sentiment_category)) +
  geom_col() +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), vjust = -0.3, size = 4) +
  scale_fill_manual(values = sentiment_colors) +
  labs(
    title = "Sentiment Distribution",
    x = NULL, y = "Number of Articles"
  ) +
  theme(legend.position = "none") +
  expand_limits(y = max(sentiment_dist$n) * 1.1)
Figure 8: Distribution of Sentiment Categories
3.5 Actor Representation
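Actor analysis identifies who is discussed in coverage. Each article’s primary actor is the actor with the highest mention count. The resulting distribution shows which actors are most prominent and where coverage may be imbalanced. Mention counts alone do not distinguish actors who are quoted directly from those who are only talked about, so they approximate, rather than measure, who is given voice.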
The actor-frame association chart shows which frames co-occur with each primary actor. This reveals differential framing: for instance, if policy makers are more frequently associated with regulation frames, this suggests their media presence centers on governance rather than opportunity or threat narratives.
actor_frame_assoc <- clean_data %>%
  filter(primary_actor != "NONE") %>%
  group_by(primary_actor) %>%
  summarise(
    n = n(),
    threat = mean(frame_THREAT_present, na.rm = TRUE) * 100,
    opportunity = mean(frame_OPPORTUNITY_present, na.rm = TRUE) * 100,
    regulation = mean(frame_REGULATION_present, na.rm = TRUE) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(n))

actor_frame_long <- actor_frame_assoc %>%
  select(primary_actor, threat, opportunity, regulation) %>%
  pivot_longer(-primary_actor, names_to = "frame", values_to = "percentage") %>%
  mutate(frame = str_to_title(frame))

ggplot(actor_frame_long, aes(x = reorder(primary_actor, percentage), y = percentage, fill = frame)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("Threat" = "#e41a1c", "Opportunity" = "#4daf4a", "Regulation" = "#377eb8")) +
  coord_flip() +
  labs(
    title = "Frame Prevalence by Primary Actor",
    subtitle = "Which frames appear when each actor is prominent",
    x = NULL, y = "Percentage of Articles",
    fill = "Frame"
  )
Figure 10: Actor-Frame Associations
3.6 Source Analysis
Outlet classification enables comparison across media types. Sources are categorized by pattern matching on domain names into tabloid, quality, regional, public, tech, education, and business press; the “Other” category captures unclassified sources. Differences in frame prevalence across outlet types may indicate systematic variation in how news is constructed.
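Domain-based classification can be written as an ordered list of regular-expression rules in which the first match wins. The sketch below is illustrative only: the helper name classify_outlet() and its four rules (HRT as the public broadcaster, Poslovni dnevnik as business press, Bug as tech press, Srednja.hr as education press) are assumptions, not the study’s classification scheme.

```r
library(dplyr)
library(stringr)

# Illustrative rules only; the study's actual domain patterns are not shown.
# case_when() returns the first matching rule, so rule order matters.
classify_outlet <- function(domain) {
  d <- str_to_lower(domain)
  case_when(
    str_detect(d, "(^|\\.)hrt\\.hr$")      ~ "public",
    str_detect(d, "(^|\\.)poslovni\\.hr$") ~ "business",
    str_detect(d, "(^|\\.)bug\\.hr$")      ~ "tech",
    str_detect(d, "(^|\\.)srednja\\.hr$")  ~ "education",
    TRUE                                   ~ "Other"  # unclassified sources
  )
}

classify_outlet(c("www.hrt.hr", "poslovni.hr", "bug.hr", "example.com"))
```

Anchoring each pattern to a full domain label prevents, for example, nothrt.hr from being classified as HRT.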
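Statistical tests provide inferential support for the observed patterns. We use a non-parametric test for the categorical frame-outlet relationship (chi-square test of independence) and a parametric test for sentiment scores across phases (one-way ANOVA with Tukey’s HSD).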
3.7.1 Frame-Outlet Association
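A chi-square test of independence assesses whether the distribution of dominant frames varies across outlet types. A significant result (p < 0.05) indicates that outlet type and frame usage are not independent, i.e., different media types systematically favor different frames. Note: the chi-square approximation becomes unreliable when expected cell counts fall below 5, in which case chisq.test() warns that the approximation may be incorrect; a Monte Carlo p-value via chisq.test(frame_outlet_table, simulate.p.value = TRUE) is one alternative.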
frame_outlet_table <- table(clean_data$dominant_frame, clean_data$outlet_type)
chisq_result <- chisq.test(frame_outlet_table)
cat("Chi-Square Test: Dominant Frame vs. Outlet Type\n")

if (chisq_result$p.value < 0.05) {
  cat("\nResult: Significant association between outlet type and frame usage (p < 0.05)\n")
}
Result: Significant association between outlet type and frame usage (p < 0.05)
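With several thousand articles, even small departures from independence reach p < 0.05, so an effect size and a check of expected counts help interpret the result. The sketch below uses a toy table in place of frame_outlet_table and a hypothetical helper cramers_v(), which computes Cramér’s V = sqrt(X² / (n (min(r, c) − 1))) from a chisq.test() result.

```r
# Toy contingency table standing in for frame_outlet_table
tab <- matrix(c(30, 10, 20,
                15, 25, 20), nrow = 2, byrow = TRUE,
              dimnames = list(frame = c("OPPORTUNITY", "THREAT"),
                              outlet = c("quality", "tabloid", "tech")))

test <- chisq.test(tab)
min_expected <- min(test$expected)  # values below 5 signal a shaky approximation

# Cramér's V from the test statistic, sample size, and table dimensions
cramers_v <- function(chisq_test) {
  x2 <- unname(chisq_test$statistic)
  n  <- sum(chisq_test$observed)
  k  <- min(dim(chisq_test$observed))
  sqrt(x2 / (n * (k - 1)))
}

cramers_v(test)
```

For this toy table X² = 80/7 ≈ 11.43 and V ≈ 0.31; the smallest expected count is 17.5, so the approximation is not a concern here.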
3.7.2 Sentiment by Phase
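One-way ANOVA tests whether mean sentiment differs across narrative phases. A significant F-statistic indicates that at least one phase mean differs from the others. With k phases the test has k − 1 numerator degrees of freedom, so the Df of 2 reported below implies that three phases contain articles in the analyzed data. Tukey’s HSD post-hoc test then identifies which pairs differ while controlling the family-wise error rate. Each comparison in its output is labeled “B-A” and reports mean(B) − mean(A), so a positive difference means the first-named phase has higher mean sentiment than the second.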
anova_result <- aov(sentiment_score ~ narrative_phase, data = clean_data)
anova_summary <- summary(anova_result)
cat("ANOVA: Sentiment Score by Narrative Phase\n")
ANOVA: Sentiment Score by Narrative Phase
print(anova_summary)
                  Df Sum Sq Mean Sq F value    Pr(>F)
narrative_phase    2     47  23.256   10.27 0.0000357 ***
Residuals       3870   8767   2.265
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
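As a worked check (not reported in the original analysis), eta-squared computed from the sums of squares in the table above gives the share of sentiment variance associated with narrative phase:

```latex
\eta^2 = \frac{SS_{\text{phase}}}{SS_{\text{phase}} + SS_{\text{residual}}}
       = \frac{47}{47 + 8767} \approx 0.005
```

So although the F-test is highly significant, phase accounts for only about 0.5% of the variance in article-level sentiment scores (the sums of squares are shown rounded).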
if (anova_summary[[1]]$`Pr(>F)`[1] < 0.05) {
  cat("\nPost-hoc Tukey HSD:\n")
  tukey_result <- TukeyHSD(anova_result)
  print(tukey_result)
}
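Article-level sentiment scores are discrete count differences, and the phases need not have equal variances. Two standard robustness checks from base R’s stats package are sketched here on synthetic data in place of clean_data: Welch’s one-way test (oneway.test() with var.equal = FALSE) and the rank-based Kruskal-Wallis test (kruskal.test()). Neither is part of the original analysis.

```r
# Synthetic stand-in for clean_data: integer-valued scores in three phases
set.seed(42)
toy <- data.frame(
  narrative_phase = factor(rep(c("Emergence", "Debate", "Integration"), each = 50)),
  sentiment_score = c(rpois(50, 2) - 2, rpois(50, 3) - 2, rpois(50, 4) - 2)
)

# Welch's ANOVA: compares means without assuming equal variances
welch <- oneway.test(sentiment_score ~ narrative_phase, data = toy, var.equal = FALSE)

# Kruskal-Wallis: compares rank distributions instead of means
kw <- kruskal.test(sentiment_score ~ narrative_phase, data = toy)

c(welch_p = welch$p.value, kruskal_p = kw$p.value)
```

If both checks agree with the classical ANOVA, the phase difference does not hinge on the equal-variance or normality assumptions.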
The analysis reveals substantial media attention to AI in education, with peaks that appear to coincide with key events such as ChatGPT’s release and the start of school semesters.
4.1.2 Frame Dominance
Contrary to initial expectations of moral panic, the OPPORTUNITY and REGULATION frames predominate over the THREAT frame across most of the study period. This suggests Croatian media took a relatively pragmatic approach to the topic.
4.1.3 Narrative Evolution
Clear evidence supports the hypothesized narrative arc:
Phase 1 (Emergence): Higher threat framing, focus on plagiarism concerns
Phase 2 (Debate): Balanced discussion of risks and benefits
This analysis demonstrates that Croatian media coverage of AI in education has followed a discernible narrative arc from initial concern to pragmatic integration. While threat frames exist, they are outweighed by opportunity and regulatory framings. The findings suggest media discourse may be more nuanced than moral panic theory would predict, with significant variation across outlet types and over time.